Create prediction platform to test altitude and distance indices #3
BrandonTrigueros wants to merge 10 commits into main
…es (#2)

Implements a comprehensive testing platform to evaluate whether altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data.

## Data Source
- Copenhagenize Index 2025 Edition (EIT Urban Mobility)
- Source: https://copenhagenizeindex.eu/
- Official name: 'The Global Ranking of Bicycle-Friendly Cities'
- Top 30 cities with scores from 50.3 to 71.1

## Components Added
### Data Collection
- retrieve_data.py: Script to fetch/update Copenhagenize Index data
- copenhagenize_index_2025.csv: Reference dataset (30 cities)

### Index Calculations
- calculate_indices.py: Functions to compute A_i and D_i
  * altitude_index: Measures hilliness of the OSM street network using node elevation data
  * distance_index: Measures network connectivity/compactness

### Analysis Platform
- prediction_platform.py: Main statistical analysis tool
  * Pearson/Spearman correlation tests
  * Linear regression modeling
  * Visualization generation
  * CSV export of results
- demo_platform.py: Simplified demo with synthetic data
- verify_structure.py: Project structure validation

### Documentation
- Comprehensive README with methodology and usage
- CHANGELOG with development history
- requirements-platform.txt for dependencies

## Hypotheses Tested
1. H1: Lower A_i → Better for cycling (flat terrain)
2. H2: D_i closer to 1 → Better for cycling (direct routes)

## Technical Stack
Python 3.12+, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, osmnx, networkx, geopandas

Resolves #2
- Tests data file integrity (CSV structure, data types)
- Validates analysis platform structure and functions
- Tests hypothesis logic with mock data
- Confirms documentation completeness

Results: 4/5 tests passing
- ✓ Data file (2025 Copenhagenize Index)
- ✓ Analysis platform structure
- ✓ Documentation files
- ✓ Hypothesis testing logic
- ⚠ Index functions (requires OSMnx installation)

Note: Full testing with OSMnx requires installing the dependencies.
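The hypothesis-logic test above checks decision rules against mock values. A minimal sketch of such a rule, assuming a two-sided test at α = 0.05 and that both hypotheses predict a negative correlation with the Copenhagenize score (for H2 this assumes D_i ≥ 1, so "closer to 1" means "smaller"). `CorrelationResult` and `evaluate_hypothesis` are hypothetical names, not the PR's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationResult:
    """Outcome of one correlation test between an index and the bicycle score."""
    r: float  # correlation coefficient
    p: float  # two-sided p-value


def evaluate_hypothesis(result: CorrelationResult, expected_sign: int,
                        alpha: float = 0.05) -> str:
    """Classify a hypothesis from its test result and predicted direction.

    expected_sign is -1 when a lower index should mean a higher score
    (e.g. H1: flatter terrain is better), +1 for the opposite direction.
    """
    if expected_sign not in (-1, 1):
        raise ValueError("expected_sign must be -1 or +1")
    if result.p >= alpha:
        return "NOT SIGNIFICANT"
    # Significant: check that the correlation points the predicted way.
    if result.r * expected_sign > 0:
        return "SUPPORTED"
    return "CONTRADICTED"
```

Returning a separate "CONTRADICTED" label keeps a significant correlation in the wrong direction from being reported as support.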
- Add results/ to .gitignore (generated outputs)
- Verified that the demo platform runs successfully
- Fixed f-string formatting error in prediction_platform.py
- Created run_analysis_skip_problematic.py to use successfully calculated cities
- Analyzed 13 cities from Copenhagenize Index 2025
- Added cache/ directories to .gitignore (OSMnx temporary files)

Results:
- H1 (Altitude): SUPPORTED (r=-0.604, p=0.0288). Lower altitude index correlates with higher bicycle scores.
- H2 (Distance): NOT SIGNIFICANT (r=-0.475, p=0.101). No significant relationship found.

Skipped cities with problematic areas (Quebec: 900x size limit)
The bikenv/ package was not being imported or used anywhere. All functionality lives in the scripts/ and analysis/ directories.
…ging

Features:
- Added 5-minute timeout per city to skip problematic large areas (like Québec)
- Enhanced progress logging with flush=True for immediate output visibility
- Show [X/Y] progress counter for each city
- Display elapsed time for each city calculation
- Better error messages distinguishing timeouts, area limits, and other errors
- Improved exception handling with KeyboardInterrupt support
- Main function now validates minimum data requirements

Changes:
- Removed run_analysis_skip_problematic.py (no longer needed)
- Old results cleared (will be regenerated with improved platform)

The platform now handles edge cases gracefully and provides clear feedback during the ~10-15 minute analysis process.
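One way to get the per-city timeout and [X/Y] progress output described above is a SIGALRM-based context manager. This is a sketch under assumptions: it works only on Unix and only in the main thread, and `time_limit`, `CityTimeoutError`, and `run_with_progress` are hypothetical names; the PR does not show which timeout mechanism it uses.

```python
import signal
import time
from contextlib import contextmanager


class CityTimeoutError(Exception):
    """Raised when one city's calculation exceeds its time budget."""


@contextmanager
def time_limit(seconds: int):
    """Interrupt the enclosed block after `seconds` (Unix only, main thread only)."""

    def _on_alarm(signum, frame):
        raise CityTimeoutError(f"timed out after {seconds} s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel a pending alarm if the block finished early
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


def run_with_progress(cities, calculate, timeout_s=300):
    """Call calculate(city) per city with an [X/Y] counter, elapsed time and a timeout."""
    results = {}
    total = len(cities)
    for i, city in enumerate(cities, start=1):
        print(f"[{i}/{total}] {city} ...", flush=True)
        start = time.perf_counter()
        try:
            with time_limit(timeout_s):
                results[city] = calculate(city)
        except CityTimeoutError as e:
            # KeyboardInterrupt is deliberately not caught, so Ctrl+C still stops the run.
            print(f"  ⚠ skipped {city}: {e}", flush=True)
            continue
        print(f"  ✓ done in {time.perf_counter() - start:.1f} s", flush=True)
    return results
```

Unlike `concurrent.futures` with `future.result(timeout=...)`, which stops waiting but leaves the worker thread running, the alarm raises inside the calculation itself, so a timed-out city actually stops.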
Resolves #2

Critical fix:
- calculate_indices_for_city now exits early if the altitude calculation fails
- Prevents attempting the distance calculation after a timeout (this was causing a 67-minute hang)
- Québec timeout now properly stops after 5 minutes instead of continuing

Cleanup:
- Removed setup.py (project is no longer an installable package)
- Updated README to document the new structure as a scripts-based analysis platform
- All dependencies managed via requirements-platform.txt

This fixes the issue where Québec timed out on altitude (5 min) but then tried to calculate distance anyway, causing another 67-minute hang before user interruption.
Results from 13 successfully analyzed cities:
- H1 (Altitude): CONFIRMED (r=-0.657, p=0.0147, R²=0.431)
- H2 (Distance): NOT SIGNIFICANT (r=-0.483, p=0.095)

Files:
- results/cities_with_indices.csv (13 cities with calculated indices)
- results/statistical_results.csv (hypothesis test results)
- results/hypothesis_testing_results.png (visualizations)
- results/altitude_index_plot.png (individual plot)

Note: Québec skipped due to area size (900x Overpass limit)
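As a hand check of the reported statistics: a Pearson correlation r over n cities can be tested with t = r·√((n − 2)/(1 − r²)) on n − 2 degrees of freedom, and for a one-predictor linear regression R² = r² (0.657² ≈ 0.432, consistent with the reported R² = 0.431 once r is rounded). The sketch uses only the r values reported above and the tabulated two-sided 5% critical value for 11 degrees of freedom (≈ 2.201); it is a cross-check, not the platform's own code, which lists scipy in its stack.

```python
import math

# Two-sided critical value of Student's t for alpha = 0.05 and df = 11
# (n = 13 cities, df = n - 2), taken from standard t tables.
T_CRIT_DF11 = 2.201


def pearson_t(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0 from a sample Pearson r."""
    return r * math.sqrt((n - 2) / (1 - r * r))


n = 13
for label, r in [("H1 (altitude)", -0.657), ("H2 (distance)", -0.483)]:
    t = pearson_t(r, n)
    verdict = "significant" if abs(t) > T_CRIT_DF11 else "not significant"
    print(f"{label}: r={r:+.3f}, R²={r * r:.3f}, t={t:+.2f} -> {verdict}")
```

It gives |t| ≈ 2.89 for H1 (above 2.201, significant) and |t| ≈ 1.83 for H2 (below it, not significant), in line with the reported p-values of 0.0147 and 0.095.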
Pull request overview
This PR implements a comprehensive testing platform to evaluate whether altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data from the Copenhagenize Index 2025. The analysis of 13 cities found a significant correlation between flatter terrain and better cycling conditions (p=0.0147), while network connectivity showed no statistically significant relationship.
Key changes:
- Implements altitude and distance index calculation functions using OpenStreetMap data
- Creates statistical analysis platform with correlation tests and linear regression
- Adds comprehensive documentation and data from Copenhagenize Index 2025
- Refactors from package structure to scripts-based platform
Reviewed changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| setup.py | Removed obsolete package setup file as project is now scripts-based |
| bikenv/module.py, bikenv/_api.py, `bikenv/__init__.py` | Removed placeholder package module in favor of functional scripts |
| scripts/retrieve_data.py | Manual data entry script for Copenhagenize Index 2025 with 30 top cities |
| scripts/calculate_indices.py | Core calculation functions for altitude (hilliness) and distance (connectivity) indices using OSMnx |
| analysis/prediction_platform.py | Main statistical analysis platform with timeout handling, correlation tests, and visualization |
| analysis/demo_platform.py | Demo version using synthetic data for testing without API dependencies |
| analysis/README.md | Comprehensive documentation of methodology, usage, and results interpretation |
| data/copenhagenize_index_2025.csv | Reference dataset with 30 bicycle-friendly cities and their scores |
| results/statistical_results.csv | Output file with correlation and regression statistics |
| results/cities_with_indices.csv | Calculated indices for analyzed cities |
| requirements-platform.txt | Python dependencies for the analysis platform |
| README.md | Updated project overview reflecting new scripts-based structure |
| CHANGELOG.md | Development history documenting implementation details |
| .gitignore | Added results/ and cache/ directories to ignore list |
```python
print(f"Error calculating distance index for {city_name}: {e}")
return None
```
The error handling also returns None without raising an exception. This is the same pattern as in calculate_altitude_index. Consider using a more explicit error handling strategy that provides better diagnostic information to the caller.
Suggested change:
```diff
-print(f"Error calculating distance index for {city_name}: {e}")
-return None
+error_message = f"Error calculating distance index for {city_name}: {e}"
+print(error_message)
+raise RuntimeError(error_message) from e
```
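With the index functions raising instead of returning None, the caller decides what to do with a failure. A minimal sketch of that caller, assuming the hypothetical signature below (index functions passed in as callables so the flow runs without OSMnx); it mirrors the early exit the commit history describes for calculate_indices_for_city, where a failed altitude step must not start the distance step.

```python
from typing import Callable, Dict, Optional

IndexFn = Callable[[str], float]


def calculate_indices_for_city(
    city_name: str, altitude_fn: IndexFn, distance_fn: IndexFn
) -> Optional[Dict[str, float]]:
    """Compute A_i then D_i, returning None as soon as either step fails.

    Hypothetical signature: the index functions are injected so the control
    flow can be shown (and tested) without network access.
    """
    results: Dict[str, float] = {}
    for name, fn in (("altitude_index", altitude_fn), ("distance_index", distance_fn)):
        try:
            results[name] = fn(city_name)
        except RuntimeError as e:
            # Stop at the first failure so a failed altitude step never starts
            # the slow distance step; __cause__ holds the original exception.
            print(f"⚠ Skipping {city_name}: {e} (cause: {e.__cause__!r})", flush=True)
            return None
    return results
```

Because the suggested change uses `raise ... from e`, the original OSMnx or network error stays available on `__cause__` for logging.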
```python
    return cities_data


def save_to_csv(data: List[Dict], output_file: str = "../data/copenhagenize_index_2025.csv"):
```
The relative path '../data/copenhagenize_index_2025.csv' assumes the script is run from the scripts/ directory. Consider using file-based path resolution or adding validation that the file exists with a helpful error message.
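A sketch of file-based resolution plus the suggested existence check, using pathlib. `default_csv_path` and `require_file` are hypothetical helpers, and the directory layout (scripts/ and data/ as siblings under the repository root) is inferred from the default path above. save_to_csv could take `default_csv_path(__file__)` as its default, while scripts that read the CSV would pass the path through `require_file`.

```python
from pathlib import Path

CSV_NAME = "copenhagenize_index_2025.csv"


def default_csv_path(script_file: str) -> Path:
    """Locate data/<CSV_NAME> relative to the script, not the working directory.

    Assumes the layout <repo>/scripts/<script> and <repo>/data/;
    call it as default_csv_path(__file__).
    """
    return Path(script_file).resolve().parent.parent / "data" / CSV_NAME


def require_file(path: Path) -> Path:
    """Return the resolved path, or fail with a message that names it."""
    resolved = Path(path).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Data file not found: {resolved}. "
            "Run scripts/retrieve_data.py to create it, or pass an explicit path."
        )
    return resolved
```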
```markdown
1. **Sample Size**: Analysis uses 15 cities for computational efficiency
2. **API Dependencies**: Requires OpenStreetMap data access
3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations
```
The documentation mentions "Elevation data may require Google Elevation API key for accurate altitude calculations" but the code in calculate_indices.py actually uses the Open Topo Data API (which is free and doesn't require an API key). This is misleading and should be corrected to accurately reflect the implementation.
Suggested change:
```diff
-3. **Elevation Data**: May require Google Elevation API key for accurate altitude calculations
+3. **Elevation Data**: Uses the Open Topo Data API (no API key required); accuracy depends on its data coverage and resolution
```
```python
os.makedirs('../results', exist_ok=True)
output_path = '../results/demo_results.png'
plt.savefig(output_path, dpi=300, bbox_inches='tight')
print(f"\n✓ Saved visualization to: {output_path}")

# Save data
output_csv = '../results/demo_cities_with_indices.csv'
```
The relative paths '../data/copenhagenize_index_2025.csv', '../results/demo_results.png', and '../results/demo_cities_with_indices.csv' assume the script is run from the analysis/ directory. Use file-based path resolution for robustness.
Suggested change:
```diff
-os.makedirs('../results', exist_ok=True)
-output_path = '../results/demo_results.png'
+results_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'results')
+os.makedirs(results_dir, exist_ok=True)
+output_path = os.path.join(results_dir, 'demo_results.png')
 plt.savefig(output_path, dpi=300, bbox_inches='tight')
 print(f"\n✓ Saved visualization to: {output_path}")
 # Save data
-output_csv = '../results/demo_cities_with_indices.csv'
+output_csv = os.path.join(results_dir, 'demo_cities_with_indices.csv')
```
```python
# Get the road network with elevation data
G = ox.graph_from_place(query, network_type="bike")

# Add elevation data using free Open Topo Data API (no key needed)
```
The function name ox.add_node_elevations_google is misleading when used with the Open Topo Data API. While the configuration at lines 14-15 correctly sets the URL to opentopodata.org, the function name still references Google. This could confuse future maintainers about which API is actually being used. Consider adding a comment explaining this discrepancy or checking whether OSMnx has a more appropriately named function for custom elevation APIs.
Suggested change:
```diff
-# Add elevation data using free Open Topo Data API (no key needed)
+# Add elevation data using free Open Topo Data API (no key needed).
+# Note: ox.add_node_elevations_google uses ox.settings.elevation_url_template,
+# which we configured above to point to the Open Topo Data endpoint, so
+# despite the function name, this does NOT call the Google Elevation API.
```
```python
print(f"✓ Saved individual plot: altitude_index_plot.png")

plt.show()
```
The plt.show() call at line 386 is problematic in non-interactive environments (e.g., running on a server, in CI/CD, or without a display). Depending on the matplotlib backend, it either blocks until the figure windows are closed or cannot display anything at all, so the script may hang or fail. Consider making this call optional via a command-line flag or environment variable, or wrapping it in a try-except block to handle environments where a display is not available.
Suggested change:
```diff
-plt.show()
+# Optionally show plots in interactive environments.
+# Controlled via environment variable to avoid blocking in headless/CI environments.
+if os.environ.get("PREDICTION_PLATFORM_SHOW_PLOTS", "").lower() in ("1", "true", "yes"):
+    try:
+        plt.show()
+    except Exception as e:
+        print(f"⚠ Unable to display plots (plt.show failed): {e}")
```
```python
def save_results(df: pd.DataFrame, altitude_results: dict, distance_results: dict,
                 output_dir: str = '../results'):
```
Similarly, the relative path '../results' assumes execution from the analysis/ directory. This will create results in unexpected locations if the script is run from elsewhere. Use file-based path resolution for consistency.
```python
print("Creating Visualizations")
print("="*70)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
```
Variable fig is not used.
Suggested change:
```diff
-fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+_, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
```
```python
"""

import osmnx as ox
import networkx as nx
```
Import of 'nx' is not used.
Suggested change (delete the line):
```diff
-import networkx as nx
```
```python
Source: https://copenhagenizeindex.eu/
"""

import re
```
Import of 're' is not used.
Suggested change (delete the line):
```diff
-import re
```
Resolves #2
Implements a comprehensive testing platform to evaluate if altitude_index (A_i) and distance_index (D_i) can predict cycling friendliness using real-world data from the Copenhagenize Index 2025.
## Results Summary
Analysis of 13 cities completed successfully:
- Hypothesis 1: Lower A_i → Better for cycling: CONFIRMED
- Hypothesis 2: D_i closer to 1 → Better for cycling: NOT SIGNIFICANT
## Data Source

## Components Added
### Data Collection (scripts/)
- retrieve_data.py: Fetch and structure Copenhagenize Index data
- calculate_indices.py: Calculate A_i (altitude) and D_i (distance) indices

### Analysis Platform (analysis/)
- prediction_platform.py: Main statistical analysis tool

### Data & Results
- data/copenhagenize_index_2025.csv: Reference dataset
- results/: Generated outputs (CSV, PNG visualizations)

### Documentation
- requirements-platform.txt: Python dependencies

## Technical Improvements

## Known Limitations

## Technical Stack
Python 3.12+, OSMnx, pandas, numpy, matplotlib, seaborn, scipy, scikit-learn, networkx, geopandas

## Files Changed
- Added: scripts/, analysis/, data/, results/ directories
- Removed: bikenv/ package (unused), setup.py (obsolete)